15 research outputs found
Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI
2015
Learning and Exploiting Relational Structure for Efficient Inference
Thesis (Ph.D.)--University of Washington, 2015One of the central challenges of statistical relational learning is the tradeoff between expressiveness and computational tractability. Representations such as Markov logic can capture rich joint probabilistic models over a set of related objects. However, inference in Markov logic and similar languages is #P-complete. Most existing tractable statistical relational representations are very limited in expressiveness. This dissertation explores two strategies for dealing with intractability while preserving expressiveness. The first strategy is to exploit the approximate symmetries frequently found in relational domains to perform approximate lifted inference. We provide error bounds for two approaches for approximate lifted belief propagation. We also describe propositional and lifted inference algorithms for repeated inference in statistical relational models. We describe a general approach for expected utility maximization in relational domains, making use of these algorithms. The second strategy we explore is learning rich relational representations directly from data. First, we propose a method for learning multiple hierarchical relational clusterings, unifying several previous approaches to relational clustering. Second, we describe a tractable high-treewidth statistical relational representation based on Sum-Product Networks, and propose a learning algorithm for this language. Finally, we apply state-of-the-art tractable learning methods to the problem of software fault localization
Efficient Belief Propagation for Utility Maximization and Repeated Inference
Many problems require repeated inference on probabilistic graphical models, with different values for evidence variables or other changes. Examples of such problems include utility maximization, MAP inference, online and interactive inference, parameter and structure learning, and dynamic inference. Since small changes to the evidence typically only affect a small region of the network, repeatedly performing inference from scratch can be massively redundant. In this paper, we propose expanding frontier belief propagation (EFBP), an efficient approximate algorithm for probabilistic inference with incremental changes to the evidence (or model). EFBP is an extension of loopy belief propagation (BP) where each run of inference reuses results from the previous ones, instead of starting from scratch with the new evidence; messages are only propagated in regions of the network affected by the changes. We provide theoretical guarantees bounding the difference in beliefs generated by EFBP and standard BP, and apply EFBP to the problem of expected utility maximization in influence diagrams. Experiments on viral marketing and combinatorial auction problems show that EFBP can converge much faster than BP without significantly affecting the quality of the solutions
Efficient Lifting for Online Probabilistic Inference
Lifting can greatly reduce the cost of inference on firstorder probabilistic graphical models, but constructing the lifted network can itself be quite costly. In online applications (e.g., video segmentation) repeatedly constructing the lifted network for each new inference can be extremely wasteful, because the evidence typically changes little from one inference to the next. The same is true in many other problems that require repeated inference, like utility maximization, MAP inference, interactive inference, parameter and structure learning, etc. In this paper, we propose an efficient algorithm for updating the structure of an existing lifted network with incremental changes to the evidence. This allows us to construct the lifted network once for the initial inferenc
Learning Relational Sum-Product Networks
Sum-product networks (SPNs) are a recently-proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning high-treewidth tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, based on Gens and Domingos' LearnSPN algorithm for propositional SPN learning. We evaluate the algorithm on three datasets; the RSPN learning algorithm outperforms Markov Logic Networks in both running time and predictive accuracy